Mammoth and Elephant Phylogenetic Relationships: Mammut Americanum, the Missing Outgroup

At the morphological level, the woolly mammoth has most often been considered as the sister-species of Asian elephants, but at the DNA level, different studies have found support for proximity with African elephants. Recent reports have increased the available sequence data and apparently solved the discrepancy, finding mammoths to be most closely related to Asian elephants. However, we demonstrate here that the three competing topologies have similar likelihood, bayesian and parsimony supports. The analysis further suggests the inadequacy of using Sirenia or Hyracoidea as outgroups. We therefore argue that orthologous sequences from the extinct American mastodon will be required to definitively solve this long-standing question.


Introduction
The mammoth lineage offers one of the most complete palaeontological records among vertebrates (Lister and Sher, 2001). Large sampling and dating evidence have contributed to set morphological adaptive changes in a precise geographical and temporal framework (Lister et al. 2005). The elephant and mammoth lineages probably diverged some 4-6 million years ago (MYA) in Africa (Shoshani and Tassy, 2005;Todd, 2006), but it is around 3 MYA that mammoths spread across the temperate and wooded habitats from Europe to China (Lister et al. 2005). Populations from China and Northern Siberia, adapted to cold and steppe conditions, progressively supplanted older forms (Lister and Sher, 2001). By 200 KYA, the woolly mammoth stage (Mammuthus primigenius) was reached in Northern Siberia and started to spread westwards to Europe. It spread later eastwards across the Beringia into Northern America, where descendants of ancestral forms adapted to the temperate grasslands already lived, Mammuthus columbi as well as pigmy mammoths (Mammuthus exilis) (Agenbroad, 2005). The cooling from the end of the Last Ice Age considerably restricted their habitat and precipitated their extinction; at the beginning of the Holocene, mammoths only survived in small refugial islands from the Arctic and Bering Sea and by 3.7 KYA the very last specimen disappeared (Vartanyan et al. 1993;Guthrie et al. 2004;Stuart et al. 2004).
Several points in this impressively well documented model are still debated though. Among these, the tempo and mode for mammoth extinction, especially with regards to possible overkilling by hunters, is perhaps the most controversial issue (Agenbroad, 2005;Stuart, 2005). But the question of the origin of the mammoth lineage has also received much attention in the last decade. Palaeontologists have long found support in morphological characters for a sister group relationship between mammoths and Asian elephants (Elephas), rather than African elephants (Loxodonta) (Maglio, 1973). This model has received additional support from the analysis of new characters, such as the hyoid apparatus (an association of nine bones connected to the cranium, the tongue and the larynx) (Shoshani and Marchant, 2001). Surprisingly, the very fi rst mammoth DNA sequence exhibited minimum genetic distance with extant African but not Asian elephants (Hagelberg et al. 1994; Table 1). Some larger sequence datasets grouped mammoth and Elephas (Yang et al. 1996;Ozawa et al. 1997; Table 1). But reanalysis of these data found again a grouping of Mammuthus and Loxodonta. These results called into question (i) the validity of morphological synapomorphies between mammoths and Asian elephants (Noro et al. 1998;Thomas et al. 2000;Debruyne et al. 2003), and (ii) the authenticity of some of the previously reported sequences (Yang et al. 1996;Thomas et al. 2000;Debruyne et al. 2003). Most recently, partial nuclear gene sequences (Capelli et al. 2006) and complete mitochondrial genomes (Krause et al. 2006;Rogaev et al. 2006) have revived the debate, showing support for the (Mammuthus, Elephas) clade. Despite some claims to the contrary, Mammuthus affinities are still far from being conclusively settled, with different topologies supported by Rogaev et al. (2006), Krause et al. (2006), and Capelli et al. (2006; summarized in Table 1). In this study, we evaluate for the fi rst time the phylogenetic signal contained in all the data, by combining all the available nuclear and mitochondrial genes.

Data construction
Mitochondrial sequences were retrieved from Genbank and manually aligned using the Seaview software (Galtier et al. 1996). An alignment of the nuclear sequences available for elephantids was kindly provided by A.D. Greenwood. The complete data sets as a whole and 16 different partitions were further analyzed (Table 2).

Most likely topologies and number of synapomorphies
For all data sets and partition subsets we estimated the most likely topologies in favor of (Mammuthus, Elephas), (Mammuth, Loxodonta) and (Elephas, Loxodonta) clustering. All computations were done using PAUP (Swofford, 2002) and the best fi tted model according to Akaike criterion (Akaike, 1974) as implemented in Modeltest (Posada and Crandall, 1998). Approximatively Unbiased (Shimodaira, 2002), Kishino-Hasegawa (1989) and Shimodaira-Hasegawa (1999) tests were done using CONSEL (Shimodaira and Hasegawa, 2001). The number of synapomorphies for each of these alternatives were all collected via direct pairwise comparisons and possible signifi cant differences were evaluated using a Chi-square test.

Partitioned bayesian inferences and partition contents
In order to test for phylogenetic support in a partitioned Bayesian framework we analyzed two partition schemes using MRBAYES v3.1.2. Both analyses employed a GTR model of evolution assuming a fraction of invariant sites and a rate heterogeneity across sites. For each, two sets of four chains sampled every 100 generations were ran until the average standard deviation of split frequencies between the two set fell below the default critical value of 0.01 using a burn-in fraction of 25%. To ensure that consensus trees were based on a rather large collection of trees, average standard deviation of split frequencies were only evaluated every 100000 generations. Our fi rst partition scheme was designed to ensure that sites under different evolutionary dynamic received independent evaluation. Thus we defi ned 10 partitions after one for each nuclear codon position, one for each mitochondrial codon positions, one for mitochondrial ribosomal RNA positions, one for mitochondrial transfer RNA positions, one for nuclear non coding positions and one for mitochondrial non coding positions. Our second scheme allowed independent estimation of model parameters for each gene or gene fragment. Mitochondrial non coding positions (to the exception of the D-loop positions) were all lump together, leading to a total of 46 partitions). These schemes are referred as P10 and P46 respectively. Finally the information content of the different genes were estimated using the hidden branch support approach described in Gatesy et al. (1999). These  Table 2. Likelihood values and number of substitutions in favor of each topology.  estimations were done both in a parsimony and likelihood framework using PAUP (Swofford, 2002).

Results/Discussion
We fi rst started by computing the likelihood of the three alternative topologies under the best-fi tting model of molecular evolution using different sets of sequences (  Table 2). Interestingly none of the tests performed on complete mitochondrial genome data sets was able to corroborate Rogaev et al. (2006) reports of significant support for the Mammuthus-Elephas clade. Noteworthy, Rogaev et al. (2006) did not provide any details on how likelihood ratio tests were performed to discriminate between alternative topologies (while this procedure is still unknown to most phylogeneticists; see Felsenstein, 2004 for a discussion of the inadequacy of likelihood ratio in testing alternative topologies). We then decided to count the total number of synapomorphies of the three possible pairs of taxa, using the state of Hyrax and Dugong sequences to polarize character changes (Table 2). Merging all data, none of the three pairs exhibits signifi cant deviation from the mean number of synapomorphies (Chi-square, p-value = 0.774), suggesting similar parsimony support for the three alternatives. All but ribosomal RNA (p = 0.004) partitions of the data yielded non-signifi cant deviations as well (0.158 < p-value < 0.819). Such a pattern might be indicative either of lineage-sorting effects among mammoths (one gene leading to a fi rst phylogenetic signature whereas another one to the opposite), of differential parallel or convergent evolution in some genes, or of poor polarization of characters. Indeed, Hyrax and Dugong have diverged from the elephant lineage about 65 MYA.
In theory, Bayesian analyses reported in Rogaev et al. (2006) could support the lineage-sorting hypothesis (Table 1) since different topologies were supported by different genes. Since mitochondria are mostly non-recombinant in animal (e.g. Barr, Neiman and Taylor, 2005), this explanation is unlikely. Furthermore, while the Mammuthus-Elephas relationship was supported by both partitioned Bayesian analyses (p = 0.931 and 0.751 for P46 and P10 respectively), the Mammuthus-Loxodonta and Elephas-Loxodonta (P10 only) alternatives were both present in the 95% credibility interval. Similarly, none of the three possible pairs of taxa exhibits signifi cant differences in the number of synapomorphies ( Figure 1). Moreover, whatever the framework or alternatives, hidden branch support is systematically detected in our data set. Such results are rather unexpected since alternatives are mutually incompatible. One possible explanation for the strong and incompatible Bayesian posterior probabilities reported by Rogaev et al. (2006) would imply a hard polytomy between Mammuthus-Loxodonta and Elephas. Lewis and co-workers (2005) have indeed convincingly demonstrated that standard MCMC procedure tends to become unpredictable, including strong shift from one to another alternative, when the true phylogeny is a hard or near hard polytomy (but see Kolaczkowski and Thornton, 2006). Alternatively, overall results could be due to inadequate polarization of the data. This fi nding is corroborated in littera since different rooting procedures (or phylogenetic methods) often come to opposite conclusions ( Table 1).
In that context, one possibility would be to build unrooted trees and check the stability of molecular clock along the branches, to infer the most probable midpoint position of the root. This strategy has already been followed in several studies (Table 1) but leaves no possibility to test if the resulting topology is better than the alternatives. We thus consider tree rooting as a prerequisite before drawing any defi nite conclusion with regards to the phylogenetic relationships of mammoths and elephants and possible hard polytomy. The American mastodon (Mammut americanum) lineage would be very useful for that purpose since it diverged from the lineage of mammoth and elephants about 24 MYA, that is 40 MY later than the Hyracoidea (e.g. Hyrax)-Sirenia (e.g. Dugong) -Proboscidia (elephantids) split (Shoshani and Tassy, 1996). Moreover, the American mastodon (Mammut americanum) became extinct at the Late Glacial Maximum, and specimens from that time range are compatible with ancient DNA recovery (Gilbert et al. 2005). Such a mastodon DNA sequence (circa 10 KYA) was used in Yang et al. (1996), but several authors have raised concerns on the authenticity of the mammoth sequences recovered (Thomas et al. 2000;Debruyne et al. 2003).
Finally, we should bear in mind that star trees are the null hypothesis in phylogenetic reconstructions. Thus if the adaptive radiation among elephantids occurred very rapidly (ca. 500 KY) as recently suggested (Krause et al. 2006), this near hard polytomy might require extensive data to be solved. One fruitful strategy would most probably be to take advantage of the ongoing Loxodonta genome project and of the 13 Mbp of the mammoth genome already published (Poinar et al. 2006). In silico analyses might identify most-informative   ΣBS ind : sum of branch support (BS) scores for that node from each data partition. BS : difference in the number of character steps (or likelihood difference) between the best topology with and without that node. HBS (Hidden branch support) = BS-ΣBS ind . For further details see Gatesy et al. 1999. candidate genes and enable to design primers in order to collect orthologous sequences in both Elephas and mastodons. New technological advances in two-round multiplex-PCR (Krause et al. 2006;Römpler et al. 2006) and large-scale sequencing (Poinar et al. 2006), combined with the exceptionally-well preserved mammoth specimens from permafrost, make this an exceptional model for the genomic study of speciation.